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Multichannel  Relative-Entropy  Spectrum  Analysis 


I.  INTRODUCTION 

We  examine  the  problem  of  estimating  power  spectra  and  cross  spectra  for  multiple  signals, 
given  selected  correlations  of  various  linear  combinations  of  the  signals,  and  given  an  inifis/  esti¬ 
mate  of  the  spectral  density  matrix.  We  present  a  method  that  produces  final  estimates  that  are 
consistent  with  the  given  correlation  information  and  otherwise  as  similar  as  possible  to  the  initial 
estimates  in  a  precise  information-theoretic  sense.  The  method  is  an  extension  of  the  Relative- 
Entropy  Spectrum  Analysis  (RESA)  of  Shore  [l]  and  of  the  Maximum-Entropy  Spectral  Analysis 
(MESA)  of  Burg  [2,  3j.  It  reduces  to  RESA  when  there  is  a  single  signal  and  to  MESA  when  the 
initial  estimate  is  flat. 

MEISA  starts  with  a  set  of  p  known  data  correlations.  It  then  estimates  a  probability  den¬ 
sity  for  the  signal  that  has  as  large  an  entropy  as  possible  (is  maximally  “flat”)  but  still  satisfies 
the  known  correlations.  Intuitively,  the  method  seeks  the  most  “conservative”  density  estimate 
that  would  explain  the  observed  data.  The  resulting  algorithm  fits  a  smooth  order  autoregres¬ 
sive  power-spectrum  model  to  the  known  correlations.  This  technique  gives  good,  high-resolution 
spectrum  estimates,  particularly  if  the  signal  either  is  sinusoidal  or  has  been  generated  by  an 
autoregressive  process  of  order  p  or  less. 

RESA  [l]  is  based  on  an  information-theoretic  derivation  that  is  quite  similar  to  that  of 
MESA,  except  that  it  incorporates  an  initial  spectnun  estimate.  This  prior  knowledge  can  often 
improve  the  spectrum  estimates  when  a  reliable  estimate  of  the  shape  of  the  overall  signal  spec¬ 
trum  is  available.  In  the  case  where  the  initial  spectral  density  estimate  is  flat,  RESA  reduces  to 
MESA. 

In  this  paper  we  derive  a  multichannel  RESA  method  that  estimates  the  joint  probability 
density  of  a  set  of  signals  given  correlations  of  various  linear  combinations  of  the  signal  and  given 
an  initial  estimate  of  the  signal  probability  densities.  The  estimator  was  briefly  presented  in  [4]. 
Our  basic  approach  is  similar  in  spirit  to  the  multisignal  spectrum-estimation  procedure  in  [5,  6], 
but  the  result  diflers  significantly  because  that  paper  not  only  assumed  that  the  initial 
probability-density  estimates  for  the  various  signals  were  independent,  but  in  effect  imposed  the 
same  condition  on  the  final  estimates  as  well.  We  show  that  if  this  assumption  is  not  made,  the 
resulting  final  estimates  are  in  fact  not  independent,  but  do  take  a  form  that  is  more  intuitively 
satisfying.  When  applied  to  the  case  of  estimating  the  power  spectra  and  cross  spectra  of  a  signal 
and  noise  given  selected  correlations  of  their  sum,  our  method  first  fits  a  smooth  power  spectrum 
model  of  the  signal  plus  noise  spectrum  to  the  given  correlations.  It  then  uses  a  smoothing 
Wiener-Hopf  filter  to  obtain  the  final  estimates  of  the  signal  and  the  noise  spectra.  This  Mul¬ 
tichannel  Relative-Entropy  Spectrum  Analysis  method  thus  represents  a  bridge  between  the  infor¬ 
mation  theoretic  methods  and  Bayesian  methods  for  spectrum  estimation  from  noisy  data. 

The  last  issue  we  consider  in  this  paper  is  treated  in  a  more  tentative  and  exploratory 
manner.  In  certain  filtering  applications  such  as  speech  enhancement,  relatively  good  estimates  of 
a  stationary  noise  background  can  be  found  during  quiet  periods  when  no  signal  is  present. 
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However,  the  signal  spectrum  may  be  changing  relatively  rapidly  so  that  good  initial  estimates  for 
this  spectrum  are  not  found  as  easily.  Unfortunately,  our  technique,  like  the  Bayesian  methods, 
req[uires  good  initial  estimates  of  both  the  signal  and  noise  spectra.  The  simplest  fix  in  the  Baye¬ 
sian  estimation  problem  is  to  estimate  the  signal  spectrum  by  spectral  subtraction  [7].  More 
sophisticated  Bayesian  methods  estimate  the  signal  model  along  with  the  signal  and  iterate 
between  filtering  steps  and  spectrum  estimation  steps  [8,  9].  With  these  methods  in  mind,  we  con¬ 
sider  several  modifications  to  our  Multichannel  RESA  method  when  a  good  initial  signal  estimate 
is  not  known.  We  try  letting  the  initial  signal  spectrum  estimate  be  infinite  or  fiat,  we  try  spec¬ 
tral  subtraction,  and  we  try  estimating  the  initial  signal  density  along  with  the  final  joint  signal 
and  noise  density.  Unfortunately,  none  of  these  approaches  gives  a  truly  convincing  solution  to 
the  problem,  and  so  the  issue  remains  open. 

n.  RELATIVE  ENTROPY 

The  Relative-Entropy  Principle  [lOj  can  be  characterized  in  the  following  way.  Let  «  be  a 
random  variable  with  values  drawn  from  a  set  v€D  with  probability  density  9^(*).  We  will 
assume  that  this  “true”  density  is  unknown,  and  that  all  we  have  available  is  an  initial  estimate 
p(*).  Now  suppose  we  obtain  some  information  about  the  actual  density  that  implies  that 
though  unknown,  must  be  an  element  of  some  convex  set  Q  of  densities.  Suppose  p^Q-  Since  Q 
may  contain  many  (possibly  infinitely  many)  difierent  probability  densities,  which  of  these  should 
be  chosen  as  the  best  estimate  q  of  q^7  And  how  should  the  initial  estimate  be  incorporated  into 
this  decision? 

The  Relative  Entropy  Principle  states  that  we  should  choose  this  final  density  9(«)  to  be 
the  one  that  minimizes  the  relative  entropy: 

H{q,p)  =  f  q(ft)\og  dw  (1) 

subject  to  the  condition  q  €Q.  It  has  been  shown  [10|  that  minimizing  any  function  other  than 
H(q ,  p)  to  estimate  q  must  either  give  the  same  answer  as  minimizing  relative  entropy  or  else 
must  contradict  one  of  four  axioms  that  any  “reasonable”  estimation  technique  must  satisfy. 
These  axioms  require,  for  example,  that  the  estimation  method  must  give  the  same  answer  regard¬ 
less  of  the  coordinate  system  chosen.  The  function  H{q ,  p)  has  a  number  of  useful  properties:  it 
is  convex  in  9 ,  it  is  convex  in  p ,  it  is  positive,  and  it  is  relatively  convenient  to  work  with  com¬ 
putationally.  If  the  convex  set  Q  is  closed  and  contains  some  q  with  H{q ,  p)  <  00,  then  there 
exists  a  q  €Q  that  minimizes  (1)  [llj.  This  solution  is  unique  up  to  a  set  of  measure  zero. 

Relative-entropy  minimization  was  introduced  as  a  general  method  of  statistical  inference 
by  Kullback  [12]  and  has  been  advocated  by  a  variety  of  authors  [13,  14, 15]  under  a  variety  of 
names,  including  cross-entropy  [16],  expected  weight  of  evidence  [17,  p.72],  directed  divergence 
[12,  p.7],  discrimination  information  [12,  p.37},  and  relative  entropy  [18,  p.l9].  The  principle  of 
Maximum  Entropy  [19,  20,  21]  is  a  special  case  of  the  Relative-Entropy  principle  [10,  22]  where 
the  initial  density  is  “flat”  over  the  domain  D. 

One  application  in  which  we  can  explicitly  state  the  form  of  the  relative-entropy  solution  q 
is  where  we  observe  the  expected  values  y*  of  a  finite  set  of  known  functions  y*  {• )  given  the 
actual  density  ).  Then  the  set  Q  of  possible  densities  is  defined  by  the  constraints: 


f  ?*(•)?(»)  dv  =  y* 
D 


for  k  =1,  .  .  .  ,  M 


In  addition,  the  density  y  (* )  must  be  properly  normalized: 

/  y  (•)</•=  1  (3) 

D 

Because  the  constraints  (2)  and  (3)  are  linear  in  y  ,  the  set  Q  of  2dl  probability  densities  satisfying 
these  constraints  must  be  convex.  If  the  y^  are  bounded  functions,  then  Q  is  closed,  and  there¬ 
fore  there  exists  a  density  y  that  minimizes  H[q ,  p)  subject  to  the  constraints  (provided  these 


are  compatible  with  H{q ,  p)  <  oo).  In  fact,  even  when  the  gi,  are  unbounded,  the  minimum* 
relative-entropy  density  q  can  be  shown  to  exist  under  fairly  general  conditions;  see  [11]  for  a 
statement  of  such  results. 

Given  the  constraints  (2)  and  (3),  we  wish  to  choose  the  final  estimate  9(*)  of  f^(*)  by 
minimizing  the  relative  entropy  (1)  subject  to  (2)  and  (3).  To  do  this,  we  introduce  Lagrange 
multipliers  \i, ,  construct  the  Lagrangian: 


f  q{9)dw  -  I  +  £)  X*  /?*(•)?(•)  <f»  -  5* 
D  S’-!  D 


and  set  the  variation  with  respect  to  9  to  zero.  We  obtidn: 


u  j 

<1  (• )  =  P  {• )  exp  I  -  Xo  -  5]  P*  (* )  J 


It  can  be  shown  that  if  there  is  a  solution  9  (*)  to  the  constrained  minimization  problem,  then  it 
must  have  the  form  (5)  with  the  possible  exception  of  a  set  of  points  on  which  the  constraints 
imply  that  q  vanishes  [12,  p.38;  ll],  Ck>nversely,  if  there  are  multipliers  such  that  9(e)  in  (5) 
satisfies  the  constraints  (2)  and  (3),  then  9  (* )  must  be  the  unique  element  of  Q  that  minimizes 
the  relative  entropy  subject  to  the  constraints  [11].  When  the  gi,  are  complex  functions,  (2)  is 
equivalent  to  two  real  constraints  for  each  k .  We  then  write  (5)  with  complex  Lagrange  multi¬ 
pliers,  define  complex  conjugate  quantities  g.r  =  tr*,  X_t  =  X/,  and  let  k  in  the  sum  range 
over  negative  as  well  as  positive  values.  In  general,  it  is  difficult  to  find  closed-form  solutions  for 
the  Xh  in  terms  of  the  constraints  •  Computational  methods  using  gradient  search  have  been 
developed,  however  [23]. 

m.  MULTICHANNEL  RELATIVE-ENTROPY  SPECTRUM  ANALYSIS 

Let  us  apply  this  theory  to  estimating  the  spectra  and  cross-spectra  of  a  set  of  L  signals,  or 
“channels”,  io(0>  •  •  •  >  which  we  collect  into  a  single  vector-valued  “multichannel”  sig¬ 
nal  *(<)  =  (  xo(t)  •  •  •  )  )^.  (In  what  follows,  we  will  use  italic  type,  such  as  z ,  P,  for 

scalars;  bold  italic,  such  as  a ,  for  column  vectors;  and  Roman,  such  as  P,  for  matrices.  Super¬ 
scripts  T  and  H  denote  the  transpose  and  the  Hermitian  adjoint,  and  a  star  denotes  the  complex 
conjugate.) 

We  assume  that  a(( )  is  a  bandlimited  stationary  random  complex  process.  To  simplify  the 
mathematics,  we  will  assume  that  a  (( )  is  a  finite  sum  of  complex  exponentials  at  frequencies  w. 
with  random  vector  amplitudes  e, ; 

.(0=  E  (6) 

n-O 

This  involves  no  essential  loss  of  generality,  since  an  arbitrary  stationary  complex  random  process 
may  be  approximated  by  the  form  (6)  with  arbitrarily  small  mean  square  error  on  arbitrarily  large 
finite  time  intervals  by  choosing  the  number  of  frequencies  large  enough  and  their  spacing  close 
enough  [24,  p.36]. 

Let  9^(eoj  •  •  ■  r  *w-i)  *  joint  probability  density  for  the  vector  amplitudes  e, .  We  can 

express  the  correlation  matrix  of  the  signal  as 


==  E  lim  -E 

r-.oo  T 


=  E  E  « 


H 


(expectation  with  respect  to  q  ^).  Fourier  transformation  gives  the  power  spectral  matrix 


3 


S(a,.)  =  E 

/  h! 

=  J  e»  e.  9^(eo,  •  •  •  ,  «w-i)  dca  '  dc^.y 

Let  US  choose  an  initial  probability  density  estimate  p  of  f  ^  such  that  the  e,  are  independent 
Gaussian  random  variables  with  zero  mean  and  covariance  P(ci;. ): 

N-\ 

P  («o,  •  •  •  ,  cs-i)  =  n  P  (e. )  (8) 

•  —0 


p(e.)  =  iV(0,  ?(«.))  = 


det  P(w, ) 


exp  (  -  e,"P(w,)-‘e,  ) 


This  choice  of  p  corresponds  to  choosing  the  initial  power  spectrum  estimate  of  S(a/, )  to  be 
PK): 

P(«. )  =  /  «.  e."p  (*0,  •  •  •  ,  «iv-i)  dco  •  •  •  des.i 

This  Gaussian  assumption  is  usually  considered  reasonable  and  is  often  implicit  in  spectrum- 
analysis  approaches  such  as  Blackman-Tukey  periodograms  [25]  or  estimation  procedures  such  as 
Wiener-Hopf  filters  [26].  For  further  discussion  of  the  assumed  form,  see  [27]. 

Now  suppose  we  leam  correlations  R),  at  various  lags  T),  of  various  pairs  of  linear  combina¬ 
tions  {t ),  (( )  of  the  vector  signal  components; 

=  E  lim  -L  /  (  a*”*(0  )  (  )*  dt 

'■  (9) 

This  rather  general  form  includes  measurements  of  correlations  of  pairs  of  single  signal  com¬ 
ponents,  t.e.  individual  matrix  elements  of  R(r|^ ).  As  ^mother  special  case,  treated  in  the  next  sec¬ 
tion,  it  includes  measurements  of  autocorrelations  of  the  sum  of  the  signal  components.  With  the 
help  of  (7),  this  gives  constraints  in  the  standard  form  of  (2)  as  follows; 


E  1  Pi>  •  •  •  .  ew-i)  deo  -  ■  ■  dcn-i 


The  Relative-Entropy  final  estimate  of  the  probability  density  of  the  e,  coefficient  given  the  ini¬ 
tial  estimate  (8)  and  constraints  (10)  is  then: 

q(eo,  ,  es-i)  =  p(eo.  •  •  •  .  os-i)  exp  |  -  Xq  -  E  X)  |  (11) 

for  some  set  of  Lagrange  multipliers  X;^ ,  which  are  chosen  so  that  q  {9)  satisfies  the  constraints 

and  is  normalized  (2).  (Again  we  use  the  device  of  setting  X_*  =  X**  and  letting  k  in  the  sum 
run  over  negative  values  as  well  as  positive.  With  the  definitions  r.*  =  -r* ,  at.*  =  /S* , 
P-k  =  «* .  this  ensures  a  real  result.)  Substituting  the  formula  (8)  for  p  (co.  •  •  •  .  «w-i)  (H) 

and  simplifying  puts  the  probability  density  estimate  into  the  following  elegant  form: 


where 


q(eo,  .  .  .  ,  Cat-i)  =  H  ?(*») 

•  -0 


,(*.)  =  iV(0,  Q(w.)) 


■  m  A  A  *^c  »■.  ■>■:  r;  ■ 


■r,  v."’.".'  ■.'  ••■_  L,"_  ',r.  T",'  r.  'J-.  r 


Q(w. )  = 


k 


1-1 


and  where  the  unknowns  X*  must  be  determined  from  the  constraints.  Substituting  this  probabil¬ 
ity  density  into  (10)  and  simplifying  reduces  the  constraints  to  the  form: 


i?*  =  a*" 


EQK)*' 


Pk 


(13) 


Adjusting  the  X^  imtil  the  latter  equations  are  satisfied  with  Q(a;. )  >  0  is  a  non-linear  problem 
that  must  be  solved,  in  general,  by  a  non-linear  gradient  search  technique. 

The  amplitudes  e„  are  a  posteriori  independent  Gaussian  random  variables  (i.e.  have 
independent  Gaussian  final  densities).  Even  if  the  channeb  of  a(()  are  a  priori  independent  (i.e. 
have  independent  initial  densities),  so  that  the  P(w, )  matrices  are  all  diagonal,  the  observation 
information  concerns  linear  combinations  of  the  channels,  and  as  a  result  the  covariance  of  the 
final  density,  Q(cj,i ),  will  generadly  not  be  diagonal.  Thus  the  final  estimates  of  the  various  chan¬ 
nels,  unlike  those  in  [5],  will  generally  be  correlated  with  each  other. 


IV.  SPECTRUM  ESTIMATION  FROM  CORRELATIONS  OF  SIGNAL  PLUS  NOISE 


A  special  case  of  great  practical  interest  is  that  in  which  we  observe  autocorrelations  only 
for  the  sum  of  the  signal  components 

»(0=  S*.(0=  *’*‘*(0  (14) 

•  ■•o 

where  e  =  (  1  1  •  •  •  1  )^.  We  then  have: 


=*^EQK)« 

11 


(15) 


These  constraints  are  identical  in  form  to  those  in  (13)  with  =  fii,  =  e  for  all  k .  We  may 
often  take  the  signal  components  s,-  (t )  to  be  a  priori  uncorrelated,  so  that  the  power  spectral  den¬ 
sity  matrix  P(w, )  is  diagonal  for  all  w, .  This  restriction,  however,  is  not  necessary. 


by: 


The  Multichannel  Relative-Entropy  Spectrum  Analysis  estimate  for  x(t)  from  (12)  is  given 


Q(wi. )  = 


p(«.r‘  + « 


(16) 


where  the  Lagrange  multipliers  X*  are  chosen  to  satisfy  (15).  The  structure  of  this  estimate  is 
quite  similar  to  the  single-channel  Relative-Entropy  Spectrum  Analysis  (RESA)  estimate  given  by 
[Ij  except  that  the  quantities  involved  are  matrices.  Namely,  the  second  term  inside  the  brackets 
is  the  product  of  a  scalar  E,  the  summation,  with  e  <^,  a  square  matrix  of  2dl  I’s.  In  the  single¬ 
signal  case,  P(<*», )  and  Q(w,  )  become  scalars,  and  we  can  replace  c  E  with  E;  the  result  is 
just  the  RESA  estimate.  On  the  other  hand,  there  is  also  a  close  formal  connection  with  the  Mul¬ 
tisignal  RESA  estimate  given  in  [5j.  That  is  equivalent  to  the  result  of  replacing  e  E  in  (16) 
by  El,  where  I  is  an  identity  matrix. 


The  expression  (16)  can  be  put  into  another  interesting  form  by  using  the  Woodbury- 
Sherman  formula  (A  +5Z?C)"*  =  A'‘  -  A~^B(CA~'B +D~^)~^CA~^: 

X  D/  ^  PK)«  «’'P(w.) 

Q(w. )  =  P(w. ) - ; - 


*  P(w,)«  + 


E  * 

k 


(17) 


Defining  initial  and  final  power-spectrum  estimates  for  the  summed  signal  y  (t )  by 


5 


^w(w.)  =  «'^P(w,)« 


we  obtain  from  (17): 


Qtr  (***• )  ) 


PnM  + 


E\  .’“•’’t 

A*  e 


and  thus 


QnM  = 


This  is  precisely  the  form  of  the  single-signal  RESA  final  estimate  with  initial  estimate  P„  (o', ). 
We  can  write  (15)  as 

^*  =  E  (20) 

The  Lagrange  multipliers  in  (19)  must  be  chosen  to  make  (cj.  )  satisfy  the  correlation  con¬ 
straints  (20).  We  can  thus  determine  \i,  in  (IS)  by  solving  a  single-channel  problem.  That  pro¬ 
vides  everything  necessary  to  determine  the  solution  Q(wa)  of  the  multichannel  problem.  We  can 
in  fact  express  Q(w,  )  directly  in  terms  of  )  and  P(w, );  from  (17)  and  (18)  we  obtain 

Q(c.,  )  =  P(«. )  +  -  — P(«.  )*  *'"P(<^. )  (21) 

These  equations  summarize  the  Multichannel  Relative-Entropy  Spectrum  Analysis  method  for 
correlations  of  a  sum  of  signals.  The  calculation  of  the  final  spectral  matrix  proceeds  in  two 
steps.  First  we  must  find  Lagrange  multipliers  such  that  the  final  estimate  of  the  power  spectrum 
of  the  sum  of  signals,  (w, )  in  (19),  has  the  observed  correlation  values  (20).  Computationally, 
this  generally  requires  a  nonlinear  gradient  search  algorithm  to  locate  the  correct  [23] .  Next 
the  final  spectral  density  matrix,  Q((a;,  )  in  (21),  containing  the  cross-spectra  as  well  as  the  power 
spectra  of  the  individual  signab,  is  formed  by  combining  a  linear  multiple  of  the  fitted  power 
spectrum  Q„  (w, )  with  a  constant  term  that  depends  only  on  the  initial  densities. 

Frequently  the  multichannel  signal  x(0  will  comprise  just  two  components,  a  signal  s(t) 
and  an  additive  disturbance  d{t): 


-E  (I"). 

The  initial  estimate  takes  the  form 

^("•)=Ip.(w,)  P.(u,.))=4U, 

The  expression  for  P„  (w,  )  specializes  to: 

P„(a;.)=  (11)  P(w.)  (  J)  =e[  |a.  +6,  j^] 

We  also  define  the  initial  cross-power  spectra  of  «  (( )  and  d(t)  with  respect  to  y(t)  as  follows: 

(  Pn(w.)  )  I  1  ) 

I  PifM  )  I  1  I 


(<r:  C) 
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( )  =  ( 1  1 )  P(w.) 

We  define  the  components  (w, ),  <2^  (w, ),  <5ii  (<**» ),  ).  Qni^m )  aod  )  simUarly. 

Then  (21)  becomes: 

QK )  -  PK )  +  (  pI;}!;  ! )  (  P,  K )  P«  K ) )  P») 

An  alternative  formula  for  Q(w, )  in  terms  of  Q„  (u. )  is: 

PnM 

r^,..  ^  f  Pf.K)  P,iM  \  ,  detPlw.)  (Mr,  ,  ^  (0A\ 

(1-1)  (24) 

P«(w.) 


V.  INTERPRETATION  OF  MULTICHANNEL  RESA 


The  formulas  defining  Multichannel  RESA  have  an  interesting  and  profound  structure  that 
may  not  be  obvious  at  first  glance.  First  of  all,  formula  (21)  makes  it  easy  to  state  conditions 
under  which  the  matrix  estimate  Q(c>;a )  is  positive  definite.  Next,  the  appearance  of  Q„  (w, )  in 
the  constraint  equation  for  the  is  actually  something  we  should  have  expected  by  the  property 
of  subset  aggregation  that  Relative-Entropy  estimators  satisfy  [28].  Furthermore,  formula  (24), 
which  builds  the  spectral  density  matrix  estimate  Q(c<;. )  by  linearly  filtering  the  fitted  model 
spectrum  Q„  (cj,  ),  is  identical  in  form  to  the  standard  Bayesian  formula  for  the  final  expected 
power  and  cross-power  in  two  signals  given  the  value  of  their  sum.  In  particular,  the  first  term  in 
(24)  applies  the  well-known  Wiener-Hopf  smoothing  filter  [26]  to  Q„  (w, ),  while  the  second  term 
can  be  interpreted  as  the  expected  final  variance  of  <r,  and  .  Finally,  we  will  show  that  the 
relative  entropy  H{q,p)  has  the  same  form  as  a  generalized  Itakura-Saito  distortion  measure 
[29] .  Thus  minimizing  relative  entropy  in  this  problem  is  equivalent  to  finding  the  spectral  matrix 
Q(^»  )  'vith  minimum  Itakura-Saito  distortion. 

A.  Positive  Definiteness 

Assume  that  the  initial  spectral  density  matrices  are  positive  definite,  P(w,  )>0;  then 
P^(u;,  )>0  also  for  all  ui,  .  This  implies  that  Q(w,  )  in  (21)  is  at  least  well-defined,  provided  we 
can  find  some  (w,  )  that  satisfies  the  correlation  constraints.  Assume  moreover  that  Q„  (w,  ) 
is  strictly  positive,  (2^(w,  )>0.  Let  «  be  any  nonzero  vector.  Since  P(w,  )  is  positive  definite, 
we  can  write  «=««+•  for  some  scalar  a  and  some  vector  «  such  that  »**P(w,  )e  =0. 
Then  (21)  implies  ■”Q(u;,  )•  =  |  o  |  ^<3„(w, )  +  »**P(w,  )•,  and  at  least  one  of  the  two  terms 
on  the  right-hand  side  must  be  positive.  Thus  *^Q(w, )«  >  0  for  every  nonzero  vector  «  ;  that 
is,  Q(w, )  is  strictly  positive  definite,  Q(w,  )>0. 

B.  Subset  Aggregation  Property 

Consider  the  following  two  approaches  to  estimating  the  final  density  of  the  frequency  com¬ 
ponents  =  <T,  -1-5,  of  y(t)  =  s  (l  )-f-d  (t ),  or  equivalently,  estimating  the  final  power  spectrum 
Qn  (w. ): 

Method  #1: 

a)  Apply  Relative  Entropy  to  estimate  the  joint  final  density  of  a,  ,  5,  : 


5.  )  ) 
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b)  Form  the  final  density  q  (i;, )  for  17,  =  from  q 


9  (»?. )  = 


Method  #2: 

a)  Form  an  initial  probability  density  estimate  p  )  from  p 

?(»?.)  =  n(o.  (11)  P(w.)  (  J]  I  =  jV(  0,P„(u;.)  ) 

b)  Solve  the  Relative-Entropy  problem  to  find  the  final  density  of  q  (f7,  )  given  the  con¬ 
straints  on  the  correlations  of  y{t)  and  the  initial  density  p  (r/,  ). 


Method  #1  is  exactly  the  approach  we  have  taken  in  this  paper— the  resulting  spectrum  estimates 
~  (  1  ^  (11)^  given  by  equations  (20)  and  (19).  According  to  the  theory 

of  Relative  Entropy,  both  methods  for  estimating  q  {ti» )  ought  to  give  identical  results.  In  fact,  if 
we  follow  method  #2  and  apply  the  Relative-Entropy  result  in  (16)  to  the  “single-channel”  signal 
y  (t ),  we  will  find  the  following  final  density  estimate  q  for  y  (t ): 

9  (»?o.  •  •  • .  ns-i)  =  n  9  (n. ) 

•  —0 


where 


9{»/.)  =  N{0  ,  Q„{u,)) 


1 

Pn  ) 


u 

+  S  * 

ik— I 


-1 


and  the  Lagrange  multipliers  X/,  are  chosen  so  that  the  correlations  of  Q„  (u;, )  have  the  correct 
values: 


As  expected,  these  formulas  for  Q„(wg )  are  identical  to  those  for  Q„(w, )  in  (19)  and  (20),  and 
thus  the  two  methods  do  indeed  give  identical  results. 


C.  Bayesian  Filtering  Interpretation 


Formula  (24)  appears  complicated,  but  it  is  in  fact  precisely  the  same  form  as  the  result 
found  by  a  purely  Bayesian  approach  to  the  problem.  Consider  the  following  situation.  Suppose 
we  are  given,  not  the  correlations  R*  ,  but  the  actual  exact  frequency  components  rj,  of  y  (/ ). 
What  b  the  final  density  of  <x»  and  6„  given  these  values  =  er,  +5,  ?  We  will  take  a  purely 
Bayesian  approach.  Because  (T,  and  6,  are  a  priori  jointly  Gaussian,  so  are  <t,  and  r?,  ,  their 
joint  density  is 


=  0  , 


^w(w.)  I 


(25) 


Now  given  17,  ,  we  can  find  the  conditional  density  of  given  >7,  by  the  standard  Bayesian  for¬ 
mula: 
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(26) 


p  (<T.  I  *?. )  ==  p**!*^*! »?. ,  p«  (w. )  - 

^nr  J 


(w,  )P^  (w,  ) 


Let  us  define  the  Bayesian  estimate  Qb  (w,  )  of  the  power  spectral  density  matrix  of  (  «t,  6,  )‘^ 
given  Jj,  as  follows; 


Qfl  (w.  )  =  E  (  6^  I  ^  ^  ^  *?• 


But  then: 


Qs  )  =  E 


(  <7,  fj,  -<7,  )  * 


“  ^  [  U.^,  I  'J-  J  E  [  (  ^.  )  I  r,.  ]  +  Var  I 

Substituting  the  final  mean  and  variance  of  <7,  given  in  (26)  into  (27)  and  recognizing  that: 

1  ^  ( /’..K)  ) 

~P„  (w. )  P„  (w. )  I  P,.  (w. )  P„  (w, )  J 


p..p.)p„;-^.)-p.,(w,)p^(w.) 


Pfi^n)  Pn(‘^*) 


PnM 


det  P(w, ) 


P».  )  Pji  (<^n  ) 

Pw  (‘^i. )  Prf  ) 


(1-1) 


det  P(u;. ) 

P»M 


PnM 

n  r,  i_  12  PjiK)  )  ,  f  1  1  r  ,  ,  ^  det  P(w. ) 

«>(“■!- I’- 1  I  t;7(o  7;>:t  r  I -1 M ' I -?;;Rr 

I  P^K)  I 

This  Bayesian  spectral  density  estimate,  however,  is  identical  in  form  to  the  Multichannel 
Relative-Entropy  Spectrum  Analysis  estimate  Q(w„ )  in  (24),  except  that  the  Bayesian  method 
uses  the  known  signal  plus  noise  spectral  power  [7,  |  ^  in  (28)  while  the  Multichannel  RESA  esti¬ 
mate  must  use  the  smooth  fitted  signal  plus  noise  power  spectrum  estimate  Qy,  (w» ).  This  simi¬ 
larity  between  the  two  methods  provides  an  interesting  interpretation  of  the  Multichannel  RESA 
formula.  The  first  term  in  (24)  is  the  product  of  the  expectation  of  ( cr,  5,  )^  and  the  expecta¬ 
tion  of  (  a*  6*  ).  The  vector  elements  P„  (w,  )/Py,  (w,  )  and  P,*,  (w,  )/Pyy  (w, )  are  exactly  the 
smoothing  Wiener-Hopf  filter  expressions  that  arise  when  estimating  a  signal  (  o-,  6„  from  an 
observation  7, .  The  second  term  in  the  Multichannel  RESA  estimate  depends  only  on  the  prior 
information  and  corresponds  to  the  a  posteriori  covariance  of  the  spectrum  estimates.  The  term: 

det  P(w, ) 

P»M 

thus  can  be  used  as  a  crude  estimate  of  the  variance  of  our  model  at  . 

D,  Asymptotic  Behavior 

It  is  interesting  to  examine  how  the  Multichannel  Relative-Entropy  algorithm  performs  in 
the  asymptotic  limit  as  the  true  correlations  of  the  signal  plus  noise  are  known  at  all  possible  lags. 
In  this  asymptotic  limit,  the  only  solution  <?yy(w, )  that  can  possibly  satisfy  all  the  correlation 
constraints  is  the  true  power  spectrum  of  the  signal  plus  noise.  Unfortunately,  the  allocation  of 
this  power  between  the  signal  and  noise  spectral  estimates  and  cross-spectrum  estimates  is  deter¬ 
mined  entirely  by  the  initial  estimates  of  these  spectra  via  the  formula  (24).  Therefore,  beyond  a 
certain  limit,  gathering  more  and  more  correlation  values  will  only  improve  the  estimate  of 


■■••---j 


(^a )  not  improve  the  estimates  of  the  signal  or  noise  power  spectra  or  cross-power 

spectra. 

E.  Generalized  Itakura-Saito  Dietortion  Meatnre 


If  we  substitute  any  zero-mean,  Gaussian  densities 

?(«.)-N(0,Q(w.)) 

and 

p(e,)  =  N(  0  ,  P{w,)  ) 
into  the  relative-entropy  formula  we  get: 

>  P  )  =  E  ‘r  I  Q(w.  )P(w,  )■*  -  I  I  -  log  det  (  Q(a;.  )P(w,  )-' ) 

This  is  just  a  generalized  version  of  the  Itakura-Saito  distortion  measure  [29] .  We  therefore  could 
have  derived  the  same  spectrum  estimate  by  minimizing  the  Itakura-Saito  distortion  measure  over 
all  possible  spectral  matrices  Q(c>/a )  subject  to  the  constraints  (13). 

VI.  COMPUTATIONAL  CONSIDERATIONS 


The  difficult  step  in  the  Multichannel  RESA  procedure  is  to  solve  for  the  Lagrange  multi¬ 
pliers  that  will  give  Q„  (oj. )  the  appropriate  correlations  in  (20).  Gradient  search  algorithms  for 
computing  this  in  general  are  given  by  Johnson  [23|.  Once  )  is  known,  the  components  of 

Q(wa )  may  be  easily  found  by  filtering  Q„  (cj, )  and  adding  in  the  final  covariance  estimate,  an 
amount  of  computation  that  is  linear  in  the  number  of  frequency  samples. 


One  special  case  is  particularly  easy  to  solve.  This  is  when  correlations  of  y{t)  are  given 
for  uniformly  spaced  lags  r*  ==  -p  ,-p  -t-l,  .  .  .  ,  p  and  when  the  initial  spectral  density  of  the  sig¬ 
nal  plus  noise,  («, )  =  (  1  1  )  P(w, )  (  1  1  )^  is  autoregressive  of  order  at  most  p  .  Let  us 

take  the  limiting  form  of  our  equations  for  equispaced  frequencies  as  the  spacing  becomes 
extremely  small,  so  that  we  can  treat  the  spectral  densities  as  continuous  functions  of  w.  Then 
because  P„  (w, )  is  autoregressive  (all-pole),  the  term  l/P^{uj, )  in  the  denominator  of 


QnH  = 


1 


+  E  x*c-"‘ 

*— r 


has  the  same  form  as  the  sum  over  k  .  We  can  therefore  combine  coefficients  in  the  two  sums  and 
write 


Qn(w)  =  -7— i - 

E 

*— f 

where  ,  .  .  .  ,  0^  are  to  be  determined  so  that: 

This,  however,  is  the  standard  Maximum  Entropy  Spectrum  Analysis  problem  (MESA),  and  can 
be  solved  with  0(p®)  calculation  by  Levinson  recursion  [30,  31,  32].  Thus  in  this  special  case,  the 
power  spectrum  of  the  signal  plus  noise  Q„  (w)  will  be  set  to  the  MESA  estimate.  The  initial  esti¬ 
mate  Pff  (w)  will  be  completely  ignored  in  this  step.  (This  is  an  example  of  “prior  washout”,  as 
discussed  by  Shore  and  Johnson  [28].  )  The  spectral  density  matrix  estimate  Q(w)  for  s  (t )  and 
d(t)  will  then  be  formed  as  the  appropriately  filtered  function  of  this  MEISA  spectrum.  In  this 
case,  therefore,  the  prior  information  is  used  solely  to  control  how  the  estimates  for  the  signal  and 


•ft  ^ 


the  disturbance  are  to  be  obtained  from  the  ME^SA  estimate.  Note  that  although  P„(u)  and 
will  both  be  autoregressive  spectra,  the  individual  components  of  the  spectral  density 
matrices  P(w)  and  Q(w)  will  generally  not  be  autoregressive. 

Vn.  UNKNOWN  INITIAL  SIGNAL  POWER  SPECTRUM  ESTIMATE 

One  potential  difficulty  in  applying  our  Multichannel  RESA  algorithm  is  that,  like  the  Baye¬ 
sian  algorithm,  it  requires  good  prior  knowledge  of  both  the  signal  and  the  noise  spectra.  Unfor¬ 
tunately,  in  some  applications  such  as  speech  enhancement,  good  prior  knowledge  may  be  avail¬ 
able  only  about  the  noise  spectrum.  In  such  cases  it  will  be  necessary  to  choose  an  initial  esti¬ 
mate  of  P„  (u, )  by  some  rule. 

In  [6],  the  method  of  [5]  is  extended  to  allow  for  the  possibility  that  not  all  initial  estimates 
are  equally  reliable.  It  is  shown  how  to  associate  weights  with  initial  estimates.  When  there  is  a 
good  initial  noise  estimate  but  a  poor  initial  speech  estimate,  one  gives  a  large  weight  to  the  ini¬ 
tial  noise  estimate  and  a  small  weight  to  the  initial  speech  estimate.  The  result  is  that  the  final 
noise  estimate  tends  to  stay  close  to  the  initial  estimate,  while  the  final  speech  estimate  does  most 
of  the  varying  to  conform  to  the  constraints.  An  analogous  method  for  incorporating  weights  in 
our  present  procedure  would  be  desirable,  but  we  do  not  yet  have  such  a  method.  Straightfor¬ 
ward  imitation  of  the  development  in  [6],  but  without  the  assumption  of  independent  final  esti¬ 
mates,  yields  final  estimates  that  do  not  depend  on  the  weights  introduced.  Accordingly  we 
explore  several  other  approaches. 

The  one  that  probably  best  fits  the  general  philosophy  of  Multichannel  RESA  is  to  assume 
that  the  signal  and  noise  are  a  priori  independent,  and  that  the  initial  signal  density 
p  ((7oi  •  ■  •  >  u  “fiat”  over  the  domain  of  interest,  or  equivalently,  that  the  signal  variance  is 
very  large.  Substituting  this  fiat  Initial  signal  density  times  a  zero-mean  Gaussian  initial  noise 
density  into  the  relative  entropy  formula  (1),  and  then  solving  for  the  resulting  spectral  estimate 
Q{w, )  yields: 


Q("« )  = 


0 


)  (>  1) 


This  is  the  same  solution  as  (16)  but  with  P,4  (w,  )=0  and  with  P„  (w, )  set  to  infinity.  Inverting 
the  spectral  matrix  and  using  the  same  algebraic  manipulations  as  before  gives  the  following 
equivalent  formula  for  ); 

B.  -  S 


EX.''"-" 

k 


Qn  ) 

-PuM  PuM 


This,  unfortunately,  is  not  quite  the  answer  one  might  have  expected.  We  first  fit  a  smooth 
autoregressive  spectrum  to  the  correlations  of  the  signal  plus  noise;  this  is  exactly  the 

MESA  spectrum  estimate,  and  it  ignores  the  prior  information  about  the  noise.  This  spectrum  is 
then  allocated  between  the  signal  and  noise  in  a  rather  peculiar  way:  the  signal  spectrum  is 
estimated  by  adding  the  noise  spectrum  to  the  smooth  estimate  of  the  signal  plus  noise.  Also  the 
final  noise  power  spectrum  is  estimated  as  exactly  equal  to  the  initial  nobe  spectrum  Pu  ); 
none  of  the  observed  data  affects  this  estimate  at  all. 


Iv:-:-) 
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In  retrospect,  this  structure  is  not  terribly  surprising.  We  started  by  assuming  infinite  signal 
variance  at  all  frequencies.  The  observed  correlation  data  is  then  taken  into  account  by  striking  a 
compromise  between  the  fitted  noisy  power  spectrum  Q„  (w, )  and  the  prior  belief  that 
Pm  (^a )  =  oo.  This  compromise,  inevitably,  is  largtr  than  the  fitted  noisy  signal  spectrum. 
Because  virtually  nothing  is  known  about  the  signal,  any  deviation  of  )  from  the  noisy 

spectrum  Pu  (w. )  is  considered  to  be  due  to  the  signal,  and  thus  the  initial  noise  spectrum  esti¬ 
mate  is  accepted  as  truth. 

Thus,  although  setting  P„  (w, )  —  oo  may  be  the  philosophically  correct  approach  when  no 
information  about  the  signal  is  given,  it  leads  to  results  that  are  counterintuitive  and  not  very 
useful.  The  problem  is  that  we  generally  know  something  about  the  signal,  if  only  that  it  has 
finite  energy.  This  might  lead  us  to  consider  choosing  as  the  initial  signal  density  estimate  a  den¬ 
sity  that  has  maximum  entropy  subject  to  a  constraint  on  the  total  signal  energy.  This 
corresponds  to  an  initial  signal  power  spectrum  estimate  that  is  flat  across  all  frequencies,  with 
constant  value  Pq-  I'he  problem  is  picking  a  good  choice  for  Pq.  Choose  Pq  too  small  and  most 
of  the  fitted  spectrum  Q„  (u, )  will  be  allocated  to  the  noise;  choose  it  too  large,  and  most  of 
Q„  (u, )  will  be  allocated  to  the  signal.  Pq  thus  becomes  an  experimental  parameter  that  must 
be  varied  to  achieve  a  desirable  eflect. 

Another,  more  ai  hoe  idea  is  to  try  estimating  P^  (w, )  by  spectral  subtraction.  Suppose  we 
are  given  the  observed  data  waveform  y(t)  with  complex  exponential  amplitudes  .  To  obtain 
the  initial  signal  power  spectrum  estimate,  subtract  the  estimated  nobe  power  from  the  observed 
periodogram  and  clip  the  result  to  positive  values: 

=  U-  )  (29) 

Then  calculate  a  few  low  order  correlations  from  3r(()>  combine  them  with  thb  initial  esti¬ 
mate  to  estimate  Q((Ja  )•  Unfortunately,  unless  |  <;•  |  ^  dips  below  Pu  (w. )  over  significantly 
large  numbers  of  frequency  samples,  it  b  easy  to  see  that  our  estimation  procedure  wiU  approxi¬ 
mately  set: 

<?wK)  *=*  ^wK)  =  “«(  l»l.  I*.  ^«(«.)  ) 

P,.M  0  I 
0  P«(w»)  ] 

Thb  spectrum  estimate  unfortunately  differs  little  from  the  spectral  subtraction  estimate,  and  it 
has  the  large  variance  of  the  periodogram,  a  situation  that  Multichannel  RESA  was  supposed  to 
cure. 

We  consider  one  final  approach,  which  seems  too  reasonable  to  resbt.  Since  the  initial  sig¬ 
nal  density  b  unknown,  one  might  argue,  perhaps  it  should  be  estimated  along  with  the  final  den¬ 
sity  q .  Thb  turns  out  to  be  a  rather  bad  idea,  but  b  interesting  enough  to  pursue  for  a  while. 
Let  us  assume  that  the  signal  and  nobe  are  a  priori  independent,  so  that: 

p(<r,S)  =  p,{a)p4(S) 

We  will  assume  that  the  nobe  density  has  the  form  in 

P W  =  ri  P(^») 

•  -0 

p(«.)  =  lV(0,P„(w.)) 

but  that  the  initial  signal  density  p,  (v)  b  to  be  estimated  along  with  the  final  joint  density  q  (*r,S) 
by  minimizing  the  relative  entropy  over  both  q  and  p,  subject  to  the  correlation  constraints  on 
q  .  We  write  this  as: 


(31) 

(32) 


Q(w,)«P(w.) 


(30) 


9.P, 


(33) 

where  Q  is  the  set  of  densities  satisfying  the  correlation  constraints  (15).  Let  us  solve  this  minim¬ 
ization  iteratively,  minimizing  Srst  over  all  q  €Q,  then  over  all  p  ,  iterating  back  and  forth  until 
the  estimates  converge: 

For  k=0,l,  •  •  • 

9*  +  J  —  mm  /f  (« ,  Pi )  (34) 

p,  ^  min  /f  (to+i,  p.  Pi ) 

p. 

The  first  problem  is  precisely  our  usual  Multichannel  RESA  algorithm  for  estimating  the  final 
density  q  (Wji)  from  the  initial  density  p,^  (*p)Pi  (^)  and  the  correlations  i?* .  To  solve  the  second 
step,  factor  q  (a,tf)  =  q  {o)q  (S  |  a)  and  rewrite  H{qi,  ,p,  Pi )  in  the  form: 


^{Qk+uP,  Pi)  =  f  9*+i(»)  log  dt 

P*  W 


+  //  log  — dS  da 

PdW 

Since  p,{a)  only  spears  in  the  first  term,  and  since  this  first  term  is  bounded  below  by  0,  the 
minimum  of  H{qi,+i,  p,  Pi )  over  p,  is  achieved  by: 

=  9*+iW  =  /  9*+i(‘^.*)  da  (36) 

Putting  all  this  together  implies  that  q(a,S)  will  generally  be  Gaussian  with  the  same  form  as 
before,  and  that  Pn^,^(^)  wiU  also  be  Gaussian  with  covariance  P„  (w, )  =  Q„  (w, ).  The  com¬ 
plete  iterative  algorithm  then  takes  the  following  form: 

Guess  F„  («, ) 

For  k  —0,1,  •  •  ■ 

a)  Use  the  Multichannel  RESA  algorithm  to  compute  Q(w, ) 

b)  Set  P„  (w, )  =  Q„  (w, ) 

Thus  we  run  our  usual  Multichannel  RESA  algorithm  to  get  a  good  estimate  of  our  signal  and 
noise  spectra.  We  then  set  our  initial  signal  spectrum  estimate  to  this  improved  final  signal  spec¬ 
trum  and  iterate.  The  improved  initial  density  should  lead  to  an  even  better  final  density.  Each 
iteration  drives  the  relative  entropy  to  ever  smaller  values,  and  thus  each  set  of  spectrum  esti¬ 
mates  ought  to  be  better  than  the  last. 

Alas,  appearances  can  be  deceiving.  This  idea  of  feeding  the  Multichannel  RESA  process 
back  on  itself,  using  the  final  estimates  as  a  new  initial  estimate  for  the  next  iteration,  has  been 
informally  suggested  by  numerous  researchers,  but  in  this  case,  where  the  procedure  converges 
depends  entirely  on  where  it  starts.  The  problem  is  that  the  original  double  minimization  prob¬ 
lem  (33)  may  have  infinitely  many  solutions.  For  example,  suppose  there  is  at  least  one  spectrum 
P„(w, )  that  is  larger  than  the  initial  noise  spectrum  estimate,  )>Pm(w.  ),  and  has  the 

correct  correlations  .  (There  may  be  infinitely  many  such  densities.)  Let 

(w, ).  Then  the  corresponding  initial  density  p,  (a)pi  (S)  will  satisfy  the 
constraints  (15),  and  thus  the  Relative-Entropy  final  estimate  will  just  be  q  (a,S)  =  p,  {a)pi  ( £) 
and  the  Relative  Entropy  at  this  solution  will  be  H{q  ,p,Pi)  —  0,  which  is  as  low  as  it  will  ever 
get.  Because  there  could  be  an  infinite  number  of  choices  for  P^  (w, ),  this  relative  entropy  prob¬ 
lem  could  have  an  infinite  number  of  minimizing  solutions.  Which  solution  our  iterative  algo¬ 
rithm  converges  to  will  thus  depend  on  where  we  start.  It  could  be  argued  that  this  iteration  is  so 
appealing  that  perhaps  we  should  use  it  anyway,  but  stop  after  only  a  couple  of  iterations.  Unfor¬ 
tunately,  this  i^>proach  is  difficult  to  justify  on  anything  but  empirical  grounds. 
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Vin.  IMPROVING  THE  ESTIMATES  WITH  ADDITIONAL  KNOWLEDGE 


The  spectral  estimates  gained  from  the  Multichannel  RESA  algorithm  can  be  significantly 
improved  if  additional  knowledge  about  the  process  can  be  incorporated.  We  will  discuss  two 
particular  types  of  knowledge  which  improve  the  spectral  estimates  substantially  for  the  case 
where  we  observe  correlations  of  signal  plus  noise. 

Suppose  that  we  observe  the  correlations  Ri,  of  a  signal  plus  noise,  y{t)  —  s  (t)  +  d(t),  as 
in  section  IV.  Suppose,  however,  that  we  luld  the  constraint  that  the  signal  and  noise  are  known 
to  be  independent,  so  that  the  densities  must  have  the  form: 

P  {<^o»  •  •  •  >  <^N-u^o>  •  •  •  I  =  P  •  •  • »  P  (^»  •  •  •  f  ^AT-l)  (37) 


and: 


q{<To,  .  .  .  ,  .  .  .  ,  Sff.i)  =  q(<To,  .  .  .  ,  q(^,  ■  ■  ■  , 


(38) 


If  we  solve  the  Minimum  Relative  Entropy  problem  with  this  additional  restriction  on  the  form  of 
q ,  then  we  will  find  that  the  spectral  estimate  Q((<;,, )  is  now  a  diagonal  matrix  with  elements 
Q,d  (<*'. )  =  Qii  (w. )  =  0.  anfi: 

<3«(w.)^ - r  ^ 


Qdd  (W»  )  = 


1 


(39) 


-p  )  — r  +  E  * 


The  Lagrange  multipliers  are  chosen  so  that  the  sum  of  these  spectra, 

(<*>% )  —  Qu  (‘^n )  +  Qu  (<*^ii )  has  the  correct  correlations.  These  formulas  are  precisely  the 
results  given  by  Johnson  and  Shore  [S|. 

In  addition  to  the  independence  constraint,  in  some  cases  we  may  believe  that  our  noise 
model  is  quite  accurate,  and  may  wish  to  constrain  the  noise  density  to  exactly  match  our  a  priori 
noise  model: 

q[6o,  •  •  . ,  6n-i)  =  p{So,  •  .  ■ ,  6n.i)  (40) 


With  this  additional  constraint,  the  Minimum  Relative  Entropy  solution 
Qdd  (w« )  =  Pu  (<*'n  ).  Q,4  (w,  )  =  (w,  )  =  0,  and: 


Q«(w.)  = 


*  I  \ 

p  /  "T 


now 


gives 


(41) 


where  the  Lagrange  multipliers  are  again  chosen  so  that  Q„  (w, )  =  Q„  (w, )  +  (cj,  )  has 
the  correct  correlations.  Since  (w, )  is  known  to  equal  (w, ),  however,  this  is  equivalent  to 
choosing  the  so  that: 

E  <3m(u».)  =  fl*  -  E  (42) 

•  • 

The  right  hand  side  is  just  the  observed  correlations  minus  the  contribution  due  to  the  noise,  and 
is  therefore  an  estimate  of  the  signal  correlations.  With  the  added  assumption  of  independence 
and  known  noise  spectrum,  therefore,  our  Multichannel  RE)SA  method  acts  like  a  correlator- 
subtracter  technique  [7],  subtracting  the  noise  correlations  from  the  observed  correlations,  and 
then  fitting  a  standard  RESA  model  to  these  estimated  signal  correlations. 


DC.  EXAMPLE 


We  define  a  pair  of  spectra,  5^  and  S„  ,  which  we  think  of  as  a  known  “background”  and 
an  unknown  “signal”  component  of  a  total  spectrum.  Both  are  symmetric  and  defined  in  the  fre¬ 
quency  band  from  -v  to  n*,  though  we  plot  only  their  positive-frequency  parts.  (The  abscissas  in 
the  figures  are  the  frequency  in  Hz,  u/2ir,  ranging  from  0  to  O.S.)  is  the  sum  of  white  noise 
with  total  power  5  and  a  peak  at  frequency  0.215  X2ff'  corresponding  to  a  single  sinusoid  with 
total  power  2.  S„  consists  of  a  peak  at  frequency  0.165  X2n‘  corresponding  to  a  sinusoid  of  total 
power  2.  Figure  1  shows  a  discrete-frequency  approximation  to  the  sum  S„  +  Su ,  using  100 
equispaced  frequencies.  From  the  sum,  six  autocorrelations  were  computed  exactly.  Su  itself 
was  used  as  the  initial  estimate  of  5^ .  For  we  used  a  uniform  (flat)  spectrum  with  the 
same  total  power  as  P^ .  The  two  initial  estimates  are  shown  in  Figure  2.  Figures  3  and  4  show 
multisignal  RESA  final  estimates  and  by  the  method  of  [5]  —independence  was  assumed 
for  the  final  joint  probability  densities  of  the  two  signals.  Figures  5  and  8  show  final  spectrum 
estimates  obtained  by  the  present  method  from  the  same  autocorrelation  data  and  initial  spec¬ 
trum  estimates.  The  initial  cross-spectrum  estimates  were  taken  to  be  zero  (P  was  diagonal).  No 
such  assumption  was  made  for  the  final  estimate,  of  course,  and  indeed  the  final  cross-spectrum 
estimates  (not  shown)  are  non-zero.  Figures  7  and  8  show  final  estimates  for  the  total  spectrum  of 
signal  plus  noise  by  the  methods  of  [5|  and  this  paper,  respectively.  Figure  7  is  just  the  sum  of 
Figures  3  and  4.  Figure  8  coincides  with  the  single-sign^  RESA  final  estimate  obtained  when 
Pu  +  Pm  is  ^tsed  as  the  initial  estimate;  it  is  not  the  sum  of  Figures  5  and  6,  since  it  includes 
contributions  from  the  cross-spectrum  estimates. 

In  the  results  for  both  methods,  the  signal  peak  shows  up  primarily  in  Q,, ,  but  some  evi¬ 
dence  of  it  is  in  Qu  as  well.  Comparison  of  Figures  5  and  6  with  Figures  3  and  4  shows  that 
both  final  spectrum  estimates  by  the  present  method  are  closer  to  the  respective  initial  estimates 
than  are  the  final  estimates  by  the  method  of  [5] 

In  view  of  the  fact  that  the  present  method  has  the  logically  more  satisfying  derivation  and 
is  computationally  cheaper,  the  comparison  of  Figure  6  with  Figure  4  is  somewhat  disappointing; 
the  signal  peak  shows  up  less  strongly  in  Figure  6.  It  must  be  pointed  out,  however,  that  in  this 
example  the  signal  and  noise  are  truly  uncorrelated.  Our  technique  does  not  use  this  information, 
and  in  fact  estimates  a  non-zero  cross-correlation  between  the  signal  and  noise.  The  method  in 
[5j,  however,  uses  this  additional  knowledge  and  therefore,  in  this  case,  is  able  to  produce  better 
estimates  than  our  technique. 


X.  DISCUSSION 


In  this  paper  we  have  derived  a  Multichannel  Relative-Entropy  Spectrum  Analysis  method 
that  estimates  the  power  spectra  and  cross-spectra  of  several  signals,  given  an  initial  estimate  of 
the  spectral  density  matrix  and  given  new  information  in  the  form  of  correlation  values  for  linear 
combinations  of  the  channels.  Both  this  method  and  the  multisignal  method  of  [5|  will  estimate 
the  power  spectra  of  a  signal  and  noise  when  prior  information  is  available  in  the  form  of  an  ini¬ 
tial  estimate  of  each  spectrum  and  given  selected  correlations  of  the  signal  plus  noise.  The 
present  method  can  accept  more  general  forms  of  correlation  data  and  also  produces  cross¬ 
spectrum  estimates,  which  are  implicitly  assumed  to  be  zero  in  [5].  Even  when  the  only  correla¬ 
tion  data  are  for  the  signal  plus  noise,  and  cross-spectrum  estimates  are  not  desired,  there  is  a 
persuasive  argument  for  preferring  the  present  method  to  that  of  [5]  —if  the  discrepancy  between 
the  given  correlation  values  and  those  computed  from  the  initial  estimates  can  be  accounted  for  in 
part  by  correlations  between  the  signal  and  noise,  then  the  correlation  data  should  be  regarded  as 
evidence  for  such  correlations,  and  correlated  final  estimates  should  be  produced. 

Estimates  by  the  present  method  are  considerably  more  economical  to  compute  than  esti¬ 
mates  by  the  method  of  [5].  The  algorithm  first  fits  a  smooth  model  power  spectrum  to  the  noisy 
signal  using  the  given  correlations.  The  available  prior  information  is  then  used  to  linearly  filter 
this  spectrum  estimate  in  order  to  obtain  separate  estimates  for  the  signal  and  the  noise.  This 
allocation  formula  is  virtually  identical  to  that  used  by  the  usual  Bayesian  formula  in  which  the 
signal  and  noise  power  spectra  are  estimated  from  the  observed  signal  plus  noise  spectrum.  The 
difference  between  the  Multichannel  RESA  and  Bayesian  methods  is  that  the  relative-entropy 
technique  starts  by  fitting  a  smooth  power  spectrum  model  to  the  observed  correlations,  while  the 
Bayesian  approach  starts  with  the  directly  observed  power  spectrum.  This  Multichannel 
Relative-Entropy  technique  thus  provides  a  smooth  model  fitting  spectrum  analysis  procedure 
that  is  closely  analogous  to  the  Bayesian  ^proach.  If  p  uniformly  spaced  correlations  are  given, 
and  if  the  prior  information  suggests  that  the  power  spectrum  of  the  signal  plus  noise  is  autore¬ 
gressive  of  order  at  most  p  ,  then  the  step  of  fitting  a  smooth  model  spectrum  to  the  noisy  signal 
is  identical  to  using  a  standard  MESA  algorithm  to  fit  a  smooth  autoregressive  model  to  the  given 
correlations.  We  concluded  by  searching  for  a  way  to  treat  the  case  where  no  prior  information 
about  the  signal  is  known.  Unfortunately,  we  were  not  able  to  find  a  theoretically  sound 
approach  with  desirable  characteristics. 

In  general,  the  method  presented  in  this  paper  yields  final  spectral  estimates  that  are  closer 
to  the  initial  estimates  than  those  of  [oj.  This  is  not  surprising.  Our  method  starts  with  an  initial 
estimate  of  the  signal  and  noise  spectra,  and  uses  correlations  of  the  signal  plus  noise  to  get  better 
power  spectra  estimates.  The  method  in  [5|  uses  the  same  information,  but  also  assumes  that  the 
signal  and  noise  are  uncorrelated.  This  additional  knowledge  further  restricts  the  constraint  space 
Q  in  which  the  probability  density  is  known  to  lie,  effectively  leaving  less  unknown  aspects  of  the 
density  to  estimate,  and  thus  improving  the  final  spectra.  In  general,  the  resulting  spectral  esti¬ 
mate  will  have  higher  relative  entropy  than  the  solution  from  our  method,  and  will  thus  be 
“farther”  from  the  initial  density  p  than  the  solution  from  our  method. 

Our  estimate  of  can  be  improved  by  observing  more  and  more  correlations  of  the 

signal  plus  noise.  Regardless  of  how  much  data  is  gathered,  however,  our  method  relies 
exclusively  on  the  initial  estimate  of  the  signal  and  noise  spectra  and  cross  spectra  to  allocate 
Qfi  (‘*'« )  between  the  signal,  noise,  and  cross  terms.  The  fundamental  difficulty  is  that  observing 
correlations  of  the  signal  plus  noise  gives  no  insight  into  how  this  observation  energy  should  be 
partitioned  between  the  signal  and  the  noise.  Achieving  accurate  estimation  of  the  signal  and 
noise  spectra  separately  requires  a  different  type  of  observation  data.  Learning  that  the  signal 
and  noise  are  uncorrelated  as  in  [5],  for  example,  will  improve  our  spectral  estimates,  as  would 
learning  the  exact  noise  power  spectrum.  The  best  solution,  of  course,  would  be  to  use  an  accu¬ 
rate  m^el  of  the  signal  and  noise  processes,  or  to  directly  observe  the  signal  and/ or  noise  correla¬ 
tions. 
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